DOCUMENT RESUME



ED 167 605



fl 868 *00 



AUTHOR       Lai, Morris K.
TITLE        Consumers Guide to Educational Evaluation.
PUB DATE     27 Mar 78
NOTE         20p.; Paper presented at the Annual Meeting of the
             American Educational Research Association (62nd,
             Toronto, Ontario, Canada, March 27-31, 1978)



EDRS PRICE   MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS  Administrator Guides; *Educational Assessment;
             Educational Testing; Employment Qualifications;
             *Estimated Costs; Evaluation Methods; *Evaluation
             Needs; Evaluators; *Guidelines; Program Descriptions;
             *Program Evaluation; Testing Problems



ABSTRACT 

Although much has been written about educational
evaluation, few guidelines exist for consumers: project directors,
school administrators, curriculum developers, legislators, teachers,
parents, and boards of education. Several cautions surface from a
review of the literature. First, tests that are based on program
objectives are most useful to evaluations. Such objectives-based or
criterion-referenced tests often must be created by the evaluation
staff, but test construction is difficult, lengthy, and costly.
Second, goals or objectives themselves must be evaluated before
program planning. Third, program design should be described in detail,
and fourth, consumers should know how to select qualified evaluators.
Empirical cost data were collected on completed evaluations whose
budgets ranged from $400 to $3 million. The data were studied to
obtain estimates of costs expected in relation to sample size, number
of schools involved, number of test items developed and used, report
length, and project staff time. Although a more thorough
investigation is needed, these preliminary results should furnish a
basis for estimating the cost of a proposed evaluation or determining
how much evaluation can be done for X amount of dollars. (CP)

I. Introduction

Recently several books have been published on educational evaluation
(e.g., Rippey, 1973; Worthen & Sanders, 1973; Anderson, Ball & Murphy, 1974;
Borich, 1974; Popham, 1974; Walberg, 1974; Yelon, 1974; Cooley & Lohnes, 1976;
Glass, 1976; Guttentag, 1977); however, very little has been written to assist
the actual consumer of evaluation. This situation is understandable in light
of the fact that contemporary educational evaluation is a relatively new
discipline that many feel had its start with the publication of Scriven's
1967 paper entitled "The methodology of evaluation," or perhaps Cronbach's
1963 article, "Course improvement through evaluation." The evaluation writings
that have followed, however, have been addressed mainly to practicing,
aspiring, or forced-to-do-it evaluators, and little has been written to help
the consumer of evaluation who does not actually do the evaluation. Included
in this category of consumers are project directors, school administrators,
curriculum developers, legislators, teachers, parents, and boards of
education. Although the concerns of this paper are not directly addressed to
evaluators, improved communication between consumer and evaluators requires
that both groups be aware of the many problem areas.

In recognition of the fact that the typical consumer of evaluation has
very little time to devote to evaluation matters, this paper will be kept as
short as feasible. References are given with the realization that very few
will be searched out and read; however, some consumers may want to go deeper
into certain areas or be able to converse in a more informed manner with their
evaluators. The overall goal of this paper is to assist the consumer in
his/her understanding of what evaluators can or cannot do and how best to
make use of evaluation.

II. State of the art of evaluation

A. Evaluation models and designs

As a relatively new field, evaluation has developed a jargon of its own.
For the consumer who encounters technical evaluation terms and jargon,
evaluation glossaries have been produced by Anderson, Ball, and Murphy (1973),
the California Program Evaluation Improvement Project (1975), and Scriven
and Roth (1977).

There is no standard approach to evaluation. Although several evaluation
models or schemata have been proposed, there is virtually no evidence about
the relative efficacy of the many models (Worthen, 1972; Smith & Murray, 1974).
The consumer, however, need not blindly accept any model. Instead he/she
can, as a start, look at Worthen and Sanders' (1973) summary of the strengths
and weaknesses of eight of the major models¹ or ask the evaluator to explain
why a given model was selected. Another fruitful endeavor would be for the
consumer to look at the evaluation design or plan created by the evaluator.
Criteria and guidelines for evaluating evaluations (and evaluation designs)
have been presented by Stufflebeam et al. (1971), Scriven (1974), and Sanders
and Nafziger (1975). The checklists by Scriven (1974) can be used to carry
out evaluations as well as assist in meta-evaluations (the evaluation of
evaluation plans). Ideally an evaluation plan would cover all of the 13 areas
described by Scriven (needs, market, true field trials, true consumer,
critical competitors, long term, side effects, process, causation, statistical
significance, overall significance, costs, extended support). In actuality
very few evaluations cover all of the areas adequately.

¹ Those due to Alkin, Hammond, Personal Judgment, Provus, Scriven, Stake,
Stufflebeam, and Tyler.

The evaluation plan or design is just the start of an arduous process
that often brings into conflict the idealism of the plan and the realities of
operating in an educational environment. There are at least two major reasons
why an evaluation design that is sound might nonetheless be followed by poor
evaluation: 1) the measures used are invalid, 2) the educational environment
interferes with the plans. In the first case, many consumers insist on
the use of norm-referenced, standardized achievement tests. For most
evaluations such tests are inappropriate because of invalidity in the form of
overgenerality, nonoverlapping of objectives, bias, low reliability for
individual test scores, questionable norms, confusing directions, and poor
items. Organizations like the National Education Association (1973) and
the National Advisory Committee on Mathematical Education (1975) have
criticized the use of standardized testing.

B. Tests

A recent study (Lai, 1975) on the Stanford Achievement Test used in
Hawai'i showed that the usual demographic (socioeconomic status) variables
like family income or parents' education accounted for 86% of the variance
in test scores, thus leaving very little that could be accounted for by, say,
a program that was being evaluated. The consumer who, on the other hand,
wants to know how a certain group of children is doing in relation to the
rest of the United States is asking for a basically impossible comparison.
If, for example, one is interested in comparing Hawai'i fourth graders with
fourth graders in general, and he/she uses standardized achievement tests,
then the comparison is invalid unless at least the following hold true:
1) Hawai'i's socioeconomic status is similar to that of the nation as a
whole, or more correctly, that of the group used to establish the norms
(there exists substantial evidence that this is not true), 2) the test does
not measure objectives that are considered unworthy by a major group of
consumers (this is important because for some subtests, only one or two
items can constitute a year's difference in grade equivalents¹), 3) test
administration was of the same caliber as that used when the norming was
carried out (there exists evidence that this is generally untrue).

Tests that are based on program objectives are more useful to
evaluations. Such objectives-based (similar terms are "criterion-referenced"
or "domain-referenced") tests often have to be created by the evaluation
staff. If the staff is inexperienced (e.g., has not previously developed
good tests), then the consumer should not expect quality tests that are
valid, reliable, easy to administer, clear in directions, etc. Test
development is difficult, lengthy, costly (a discussion of cost guidelines
is presented later in this paper), and not necessarily doable. If competence,
time, or money is lacking, there is little hope. Preventive tactics on the
part of the consumer could take the form of careful hiring of evaluators
(see Section III) and/or ensuring that the program staff be prepared to
state clearly what they would accept as evidence of success of a given
program or product. Some idealists suggest that the test items be developed
before any curriculum development takes place.

The generally low quality of tests available has been recognized in the
literature (CSE, 1972, 1974), and some recommendations have been made. In
areas like attitude toward math, for example, it has been recommended that
reinventing the wheel be discouraged and use be made of what is already
available (Aitken, 1976). Objectives banks (e.g., Instructional Objectives
Exchange (IOX) or Westinghouse Learning Corporation) with associated items
have been created in an attempt to help individuals select custom-made
(based on objectives) measures. The consumer, however, is unlikely to be
pleased with all of the items produced. Extensive evaluations of all
published tests have been made by the Center for the Study of Evaluation at
UCLA (1972, 1974). A competent evaluator would be generally aware of
tests available and be able to find appropriate ones if any. If test
construction is required, the consumer must allow time for pilot tests of the
measures. Because of the costs and time involved, it is usually wise to
keep the number of tests being developed to a minimum. It is better to
get reasonably good data in a few areas than a lot of questionable data
from many areas. Baker (197*) and Popham (19^) have advocated such an
approach to evaluation, which calls for lean (but important) data. The
lean-data evaluation readily admits lack of comprehensiveness while at the
same time defending the worthwhileness of collecting good data in a focused
area. For some consumers and budgets, this approach may be the only
feasible way to go.

¹ Problems associated with the use of grade equivalents have been written
up elsewhere. In general consumers are advised not to use them.

Revisions of pilot versions of tests are carried out in the form of
item analyses. It is rather easy for the consumer to be overwhelmed by the
technical end of item analysis that uses terms like alpha coefficient,
biserial correlations, factor analysis, or discrimination index. Oftentimes
the program staff leaves too much in the hands of the external (outside
of the program) test developer. In matters of test reliability or internal
consistency, the evaluator/test developer is probably more qualified than
the program staff; however, in terms of validity, a content analysis,
preferably based on some theory, performed by the program staff is more likely
to improve test validity than anything the evaluation staff does. Colwell
(1970) and Guttman (1976) have emphasized this point in their writings.
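
For the consumer who wants a feel for two of the item-analysis terms named above, the following is a minimal sketch in Python, not a procedure taken from this paper. It computes an alpha coefficient (internal consistency) and an upper-lower discrimination index from pilot-test responses scored 0 or 1. The function names, the pilot data, and the 27% upper/lower group convention are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Alpha coefficient for a list of per-examinee item-score lists.

    Assumes at least two items and total scores that are not all equal
    (otherwise the total-score variance is zero and alpha is undefined).
    """
    k = len(responses[0])  # number of items on the pilot test
    item_vars = [variance([person[i] for person in responses]) for i in range(k)]
    total_var = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def discrimination_index(responses, item, fraction=0.27):
    """Proportion passing the item in the top-scoring group minus the
    proportion passing it in the bottom-scoring group.

    The 27% group size is a common convention, assumed here.
    """
    ranked = sorted(responses, key=sum)  # lowest total score first
    n = max(1, round(len(ranked) * fraction))
    lower, upper = ranked[:n], ranked[-n:]
    p_upper = sum(person[item] for person in upper) / n
    p_lower = sum(person[item] for person in lower) / n
    return p_upper - p_lower


# Hypothetical pilot data: six examinees, four items (1 = correct).
pilot = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print("alpha:", round(cronbach_alpha(pilot), 3))
for i in range(4):
    print("item", i, "discrimination:", discrimination_index(pilot, i))
```

Statistics of this kind speak only to reliability and item behavior. They do not replace the content analysis by program staff that the paragraph above recommends for validity.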

Although Scriven (1972b) has advocated evaluation that is uncontaminated
by program goals, his goal-free approach is infrequently used. The consumer
should be sensitive to the possibility of side-effects (not directly related
to program goals), which are often unanticipated. In the past, side-effects
have sometimes turned out to be major findings. (For example, the finding
that tutors made substantial cognitive gains, whereas gains by tutees were
minimal.)

C. Goals and objectives 

A contemporary approach to evaluation requires that goals or objectives 
themselves be evaluated since it is of little value to know that goals were 
or were not attained if the goals themselves are not worthwhile ones. The 
evaluation literature has suggestions on ways to do this, and program staff 
are likely to be involved (e.g., see "Delphi technique" in the Scriven and 
Roth (1977) and Anderson et al. (1973) glossaries). This aspect is often
intertwined with needs assessment in that goals should be related to needs. 
According to Stake (1970) objectives should be treated as fallible data! 
More attention should be given to statements of priority. 

D. Program description

Good evaluation also requires that the product or program be described
in some detail; it is not enough just to test participants before and after
their exposure to a product or program. If a teaching process is involved,
it should be observed and analyzed in terms of fidelity or quality of
treatment. The modern approach to evaluation is strong on not taking for
granted that program differences exist merely because there are two different
labels (e.g., curriculum A and curriculum B). Charters and Jones (1973) have
discussed the risks of appraising such non-events (no real differences in
program presentation) in evaluation.

Evaluation must also take into account what characteristics the product
or program users bring to the situation. Things like socioeconomic variables
(e.g., family income, parents' education, ethnicity) and pretest cognitive
and affective (e.g., attitude) performance are often important to the
interpretation of exit level performance. Where possible invasion of privacy
is involved, the consumer and evaluator should be aware of the various Federal
and local laws that apply (e.g., see Weinberger & Michael, 1977). Complex
laws, of course, also apply to any contracts between consumer and evaluator.

III. Selecting the evaluator

All of the concerns discussed in Part II become painfully academic if an
incompetent or lazy evaluator is hired. A few studies have been done on what
skills evaluators need (e.g., Bunda (1973), Worthen (1975)), but the consumer
who must hire an evaluator will not be in a good position to know if the
desired skills are possessed by the evaluator. The best recourse is for the
consumer to obtain, before hiring, examples of the evaluator's previous
work. The consumer can then perform a quick metaevaluation using the criteria
or approaches mentioned in Part II of this paper. It would also be useful
to interview prospective evaluators to discuss their evaluation methods and
philosophies. If the prospective evaluator is inexperienced in the
on-the-job sense, then the academic training must be scrutinized. Evaluation
specialization is offered at only a few universities or colleges. Phi Delta
Kappa (1976) has published a comprehensive description of existing evaluation
training programs. As a minimum an inexperienced evaluator should have had
training in tests and measurements, statistics (preferably to the advanced
levels of correlation, multivariate analysis, regression, etc.), research
design, curriculum design, and evaluation. He/she should be a decent writer
as well as a good communicator. Scriven (1972a) has gone so far as to suggest
that a well-qualified evaluator is one who has all the skills known to
humanity.

IV. Getting the evaluation done

The last obvious major pre-evaluation step is the contracting process.
Again a checklist approach is available to help the consumer not only write
up an evaluation contract but also decide whether or not the evaluation
should even be done (Wright & Worthen, 1975).

In deciding how much money to set aside for evaluation, project directors
and evaluators have had virtually no help from the literature. Rules of
thumb like "5% and 10% of the program budget should be set aside for
evaluation" have been given without much rationalization; furthermore, these
rules of thumb have usually referred to desired amounts rather than minimal
amounts needed to do a decent job.
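
To make the rule of thumb concrete, the short Python sketch below converts a program budget into the 5% and 10% set-aside amounts. The function name and the $250,000 program budget are hypothetical. The percentages are simply the quoted rules of thumb, which, as noted above, carry little rationale and do not indicate the minimum needed for a decent evaluation.

```python
def evaluation_set_aside(program_budget, low_pct=5, high_pct=10):
    """Return the (low, high) dollar amounts implied by the rule of thumb.

    low_pct and high_pct default to the 5% and 10% figures quoted in the
    text; they are rules of thumb, not empirically derived minimums.
    """
    low = program_budget * low_pct / 100
    high = program_budget * high_pct / 100
    return low, high


# Hypothetical program budget, chosen only for illustration.
low, high = evaluation_set_aside(250_000)
print(f"Rule-of-thumb evaluation budget: ${low:,.0f} to ${high:,.0f}")
```

A consumer could compare the resulting range against the contracted costs of completed evaluations of similar scope before committing to a figure.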

A. Evaluation costs 

In order to make more rational decisions about evaluation budgets, the
decision maker needs to know what he/she can get for X amount of dollars.
Here is where the professional evaluation literature, in general, is lacking.
Empirical cost data collected on completed evaluations whose budgets ranged
from $400 to $3 million were studied to obtain ball-park estimates of costs
expected in relation to sample size, number of schools involved, number of
instruments and items developed and used, report length, and project staff
time. Because of the tremendous variation in the scope and costs of the
sample of completed evaluations, it was not possible to come up with easily
used categories that related evaluation parameters to costs.

Instead it was decided that representative examples might be of some
use to evaluation consumers. A more in-depth study of evaluation costs
is currently being undertaken and will be reported in a subsequent paper.

Some cautions are essential to the proper interpretation of the following
figures. First of all, dollar amounts represent contracted costs, although
an attempt was also made to obtain information on "costs" in the form of
labor or facilities provided by the funding agency or program being
evaluated. Secondly, it was virtually impossible to compare the various
evaluations in terms of amount of travel required or level of evaluation staff.
Thirdly, no claims are made as to the representativeness of the sample.
Although many types of evaluation organizations are represented, the attempt
at sampling has been, at best, casual. Finally, where profit was involved
(as opposed to evaluations done by non-profit organizations), such monetary
results were not available. The one thing that all the included evaluations
do have in common is the fact that they all represent completed evaluations
for which contracted funds are known. Although a much more thorough
investigation is needed in this area, the preliminary results presented here
should be of some use as a guideline for evaluation consumers, many of whom
have virtually no basis for estimating the cost of their proposed evaluation
or determining how much evaluation they can get done for X amount of dollars.
At the very least they will now have some ball-park figures based on actual
contracted and completed evaluations. They will also see that evaluations
are often rather expensive, although Scriven has suggested that even very
costly evaluations can, in essence, be cost-free if their benefits exceed
their monetary costs.

B. Cost estimates

The original plan for collecting information on the costs of
evaluations called for looking exclusively at completed contracts as opposed
to proposal-type cost estimates. The rationale here was that consumers
would benefit more from actual, real-world figures than from proposed
budgets which in actual bidding might turn out to be either at the low
or high extremes. As it turns out, however, experience with completed
evaluation contracts has enabled some institutions or persons to be skilled
at estimating costs for certain types of evaluations. Because such
estimates are also part of the real world in the sense that RFPs (Requests
for Proposals) often reflect those estimates, it was decided to include
some discussion on the matter.

The Department of Education in the State of California has provided
the following estimate for evaluation at the enumeration level, in which
data are collected on matters like funding history, program objectives,
number of participants, budget, and other descriptive information. At
this level no participant or other product data are collected. Evaluation
activities include a) analysis of legislation, b) preparation, field
testing, and distribution of forms, c) tabulation of responses, and d)
report writeup. For a simple program with 1000 district responses, total
cost for this type of in-house report would be around $15,000.

The Hawai'i State Department of Education has found that evaluations of
programs used statewide (there are approximately 12,000 pupils per grade)
have a minimal cost of about $100,000. Statewide Title I evaluations are
expected to run in the $100,000 to $125,000 neighborhood.

V. Conclusion

Although this paper has not directly addressed the problem of reducing
evaluation costs, it has laid the foundation for a subsequent effort to
promote evaluation cost efficiency. What this paper hopes to reduce is
the percent of evaluation consumers who are basically unknowledgeable
about a) what evaluation can or cannot do and b) how to have evaluations
done by another party. As consumers become more knowledgeable, they will
be better able to make use of evaluation findings. In terms of
Hutchinson's (1972) criteria for successfulness of evaluation, such an
increase in evaluation use by the consumer would imply an increase in the
quality of evaluation, which in turn can lead to an increase in the quality
of the entities being evaluated.